Conversation

@swombat swombat commented Jan 3, 2026

Summary

Adds support for Extended Thinking (also known as reasoning) across Anthropic, Gemini, and OpenAI/Grok providers. This feature exposes the model's internal reasoning process, allowing applications to access both the thinking content and the final response.

Usage

```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-5-20251101')
  .with_thinking(budget: :medium)  # or :low, :high, or Integer

response = chat.ask('What is 15 * 23?')
response.thinking  # => 'Let me break this down step by step...'
response.content   # => 'The answer is 345.'

# Streaming with thinking
chat.ask('Solve this') do |chunk|
  print chunk.thinking if chunk.thinking
  print chunk.content
end
```

Provider Support

| Provider | Models | Implementation |
|----------|--------|----------------|
| Anthropic | claude-opus-4-*, claude-sonnet-4-* | `thinking` block with `budget_tokens` |
| Gemini | gemini-2.5-*, gemini-3-* | `thinkingConfig` with budget or effort level |
| OpenAI/Grok | grok-* models | `reasoning_effort` parameter |

Budget symbols (:low, :medium, :high) are translated to appropriate provider-specific values. Integer budgets specify token counts directly.
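As an illustration, the translation for a single provider might look like this (constant and method names are hypothetical, using the Anthropic values from the docs table later in this thread):

```ruby
# Hypothetical symbol-to-token translation for one provider.
THINKING_BUDGETS = { low: 1_024, medium: 10_000, high: 32_000 }.freeze

def budget_tokens(budget)
  budget.is_a?(Integer) ? budget : THINKING_BUDGETS.fetch(budget)
end

budget_tokens(:medium)  # => 10000
budget_tokens(5_000)    # => 5000
```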

Changes

Core:

  • Message: Added thinking and protected thinking_signature attributes
  • Chat: Added with_thinking(budget:) and thinking_enabled? methods
  • StreamAccumulator: Accumulates thinking content during streaming (see the sketch after this list)
  • UnsupportedFeatureError: New error for unsupported feature requests
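A minimal sketch of what the thinking accumulation could look like (names and structure here are illustrative, not necessarily the PR's actual code):

```ruby
# Illustrative only: accumulate thinking and content side by side
# as streamed chunks arrive.
class StreamAccumulator
  attr_reader :content, :thinking

  def initialize
    @content = +''
    @thinking = +''
  end

  def add(chunk)
    @content << chunk.content if chunk.content
    @thinking << chunk.thinking if chunk.thinking
  end
end
```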

Providers:

  • Anthropic: Full thinking support with signature for multi-turn (see the payload sketch after this list)
  • Gemini: Supports both 2.5 (budget) and 3.0 (effort level) APIs
  • OpenAI: Supports Grok models via reasoning_effort
  • Bedrock/Mistral: Accept thinking parameter (no-op for compatibility)
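For reference, Anthropic's Messages API enables thinking via a `thinking` block in the request, and `max_tokens` must exceed the thinking budget. A sketch of the payload shape (the wrapper method here is hypothetical):

```ruby
# Payload shape per Anthropic's extended thinking API; the wrapper
# method is illustrative, not the PR's actual code.
def thinking_payload(model:, messages:, budget_tokens: 10_000)
  {
    model: model,
    messages: messages,
    max_tokens: budget_tokens + 4_096, # must exceed budget_tokens
    thinking: { type: 'enabled', budget_tokens: budget_tokens }
  }
end
```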

ActiveRecord:

  • Migration template includes thinking and thinking_signature columns
  • ChatMethods: Added with_thinking delegation and persistence
  • MessageMethods: Extracts thinking attributes in to_llm

Documentation:

  • New guide: docs/_core_features/thinking.md

Tests:

  • 82 examples covering unit and integration tests
  • VCR cassettes for claude-sonnet-4, claude-opus-4, claude-opus-4-5, and gemini-2.5-flash

Type of change

- [ ] Bug fix
- [x] New feature
- [ ] Breaking change

Scope check

- [x] I read the Contributing Guide
- [x] This aligns with RubyLLM's focus on LLM communication
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] For provider changes: Re-recorded VCR cassettes
- [x] All tests pass: `bundle exec rspec` (736 examples, 0 failures)
- [x] I updated documentation
- [x] I didn't modify auto-generated files manually (except adding 3 models for testing)

API changes

- [ ] Breaking change
- [x] New public methods/classes (`with_thinking`, `thinking_enabled?`, `UnsupportedFeatureError`)
- [ ] Changed method signatures
- [ ] No API changes

Related issues

Closes #551

In openai/chat.rb:

```ruby
  payload
end

def grok_model?(model)
```
Contributor

I'm a bit confused by this. Does OpenAI provide a model called "grok"?

Author

No, but RubyLLM routes to Grok via OpenRouter, which uses openai/chat.rb.


AI explanation:

  1. OpenRouter inherits from OpenAI: `class OpenRouter < OpenAI` (line 6 in openrouter.rb)
  2. Grok models are routed via OpenRouter: `"provider": "openrouter"` in models.json
  3. No dedicated xAI provider exists

So yes, Grok API calls via OpenRouter do use openai/chat.rb because OpenRouter inherits all of OpenAI's chat logic.

The grok_model? method in openai/chat.rb is there because:

  • OpenRouter uses OpenAI-compatible API format
  • When a Grok model is detected, the reasoning_effort parameter is added for thinking support

The naming is technically correct but could be confusing.


We added a clarifying comment to the method.
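For context, a hypothetical sketch of what the helper plus clarifying comment might look like (the PR's actual matching logic may differ):

```ruby
# Grok models have no dedicated xAI provider in RubyLLM; they are reached
# through OpenRouter, which inherits this class's chat logic, so the
# detection lives here. (Illustrative sketch only.)
def grok_model?(model)
  model.to_s.include?('grok')
end
```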

swombat and others added 2 commits January 8, 2026 22:24
- Fix incorrect comment that said "OpenAI" instead of "Anthropic"
- Replace .present? with && !.empty? to avoid ActiveSupport dependency
  in core library code

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Explains why Grok model detection exists in the OpenAI provider:
Grok models are accessed via OpenRouter which inherits from OpenAI.

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Owner

@crmne crmne left a comment

Thanks so much for this PR. It's a strong base and I really appreciate the work you put into it. I left a bunch of comments, mostly around separating effort and budget more cleanly, since some providers like Anthropic need both.

This feature is fairly high-priority for me, so unless you feel like you can iterate on it and land it over the weekend, I may take a pass at finalizing the implementation myself. That said, I’d love to keep this collaborative if you're up for it, feel free to push another update soon and we’ll get it over the line.

Thanks again! Great to see this coming together.


## What is Extended Thinking?

Extended Thinking (also known as "reasoning") is a feature that exposes the model's internal reasoning process. When enabled, models will "think through" problems step-by-step before providing their final response. This is particularly useful for:
Owner

> exposes the model's internal reasoning process

Not technically true. The model you used for this PR optimized for exposing the model's internal reasoning process, but extended thinking really gives the AI more time and computational "budget" to deeply analyze complex problems, break them down, plan solutions, and self-reflect before answering. This significantly boosts performance on tasks like coding, math, and logic, at the expense of being slower and more costly.

It's like a multi-pass codec for video.

Comment on lines +59 to +78
### Budget Options

The `budget` parameter controls how much "thinking" the model should do:

| Budget | Description |
|--------|-------------|
| `:low` | Minimal thinking, faster responses |
| `:medium` | Balanced thinking (default) |
| `:high` | Maximum thinking, most thorough |
| Integer | Specific token budget (provider-dependent) |

```ruby
# Symbol budgets
chat.with_thinking(budget: :low)
chat.with_thinking(budget: :medium)
chat.with_thinking(budget: :high)

# Integer budget (tokens)
chat.with_thinking(budget: 10_000)
```
Owner

Here I think we're conflating effort and budget. Certain providers prefer effort (e.g. OpenAI), others prefer budget and have optional effort (e.g. Anthropic), therefore even a minimal interface should have both.
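One possible shape for an interface exposing both knobs, sketched here as hypothetical calls rather than the final API:

```ruby
# Hypothetical interface, not the merged API:
chat.with_thinking(effort: :medium)                 # effort-first providers (e.g. OpenAI)
chat.with_thinking(budget: 10_000)                  # budget-first providers (e.g. Anthropic)
chat.with_thinking(budget: 32_000, effort: :high)   # providers that accept both
```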

Comment on lines +121 to +128
chat.ask("Complex question here...") do |chunk|
thinking_content << chunk.thinking if chunk.thinking
response_content << chunk.content if chunk.content

# Update UI with separated content
update_thinking_panel(thinking_content)
update_response_panel(response_content)
end
Owner

this is great design! doesn't break the current way of looping chunks.

Comment on lines +146 to +160
### Provider-Specific Behavior

| Provider | Models | Implementation |
|----------|--------|----------------|
| Anthropic | claude-opus-4-*, claude-sonnet-4-* | `thinking` block with `budget_tokens` |
| Gemini | gemini-2.5-*, gemini-3-* | `thinkingConfig` with budget or effort level |
| OpenAI/Grok | grok-* models | `reasoning_effort` parameter |

Budget symbols are automatically translated to provider-specific values:

| Symbol | Anthropic | Gemini 2.5 | Gemini 3 | Grok |
|--------|-----------|------------|----------|------|
| `:low` | 1,024 tokens | 1,024 tokens | "low" | "low" |
| `:medium` | 10,000 tokens | 8,192 tokens | "medium" | "high" |
| `:high` | 32,000 tokens | 24,576 tokens | "high" | "high" |
Owner

this will need to be rewritten taking into account the comment above

Comment on lines +202 to +209
```ruby
class AddThinkingToMessages < ActiveRecord::Migration[7.0]
  def change
    add_column :messages, :thinking, :text
    add_column :messages, :thinking_signature, :text
  end
end
```
Owner

Slightly better:

```ruby
add_column :chunks, :thinking_text, :text
add_column :chunks, :thinking_signature, :string
```

Comment on lines +270 to +271

```ruby
attrs[:thinking] = message.thinking if @message.has_attribute?(:thinking)
attrs[:thinking_signature] = Messages.signature_for(message) if @message.has_attribute?(:thinking_signature)
```
Owner

I would do something like this:

```ruby
class Chunk < ApplicationRecord
  def thinking
    return nil unless thinking_text || thinking_signature

    OpenStruct.new(
      text: thinking_text,
      signature: thinking_signature
    )
  end
end
```

so we have the exact same interface between PORO and Rails

Comment on lines +17 to +18

```ruby
thinking: thinking_value,
thinking_signature: thinking_signature_value,
```
Owner

I would make thinking its own object:

```ruby
module RubyLLM
  class Thinking
    attr_reader :text, :signature

    def initialize(text: nil, signature: nil)
      @text = text
      @signature = signature
    end
  end
end
```

Comment on lines +188 to +189
raise UnsupportedFeatureError,
"Model '#{@model.id}' does not support extended thinking"
Owner

Please remove this: this goes against our current philosophy of never stopping the user from doing something we think is wrong if the API will do the same.

Comment on lines +21 to +27

```ruby
# Error raised when a feature is not supported by a model
class UnsupportedFeatureError < Error
  def initialize(message)
    super(nil, message)
  end
end
```
Owner

no need.


```diff
 def read_from_json(file = RubyLLM.config.model_registry_file)
-  data = File.exist?(file) ? File.read(file) : '[]'
+  data = File.exist?(file) ? File.read(file, encoding: 'UTF-8') : '[]'
```
Owner

why?


Development

Successfully merging this pull request may close these issues.

[FEATURE] Better Thinking/Thinking Streaming support
